An inverse problem is a general framework that is used to convert observed measurements into information about a physical object or system that we are interested in. For example, if we have measurements of the Earth's gravity field, then we might ask the question: "given the data that we have available, what can we say about the density distribution of the Earth in that area?" The solution to this problem (i.e. the density distribution that best matches the data) is useful because it generally tells us something about a physical parameter that we cannot directly observe. Thus, inverse problems are one of the most important, and well-studied mathematical problems in science and mathematics. Inverse problems arise in many branches of science and mathematics, including: computer vision, machine learning, statistics, statistical inference, geophysics, medical imaging (such as computed axial tomography and EEG/ERP), remote sensing, ocean acoustic tomography, nondestructive testing, astronomy, physics and many other fields.
Contents |
The field of inverse problems was first discovered and introduced by Soviet-Armenian physicist, Viktor Ambartsumian.[1][2]
While still a student, Ambartsumian thoroughly studied the theory of atomic structure, the formation of energy levels, and the Schrödinger equation and its properties, and when he mastered the theory of eigenvalues of differential equations, he pointed out the apparent analogy between discrete energy levels and the eigenvalues of differential equations. He then asked: given a family of eigenvalues, is it possible to find the form of the equations whose eigenvalues they are? Essentially Ambartsumian was examining the inverse Sturm–Liouville problem, which dealt with determining the equations of a vibrating string. This paper was published in 1929 in the German physics journal Zeitschrift für Physik and remained in oblivion for a rather long time. Describing this situation after many decades, Ambartsumian said, "If an astronomer publishes an article with a mathematical content in a physics journal, then the most likely thing that will happen to it is oblivion."
Nonetheless, toward the end of the Second World War, this article, written by the 20-year-old Ambartsumian, was found by Swedish mathematicians and formed the starting point for a whole area of research on inverse problems, becoming the foundation of an entire discipline.
The inverse problem can be conceptually formulated as follows:
The inverse problem is considered the "inverse" to the forward problem which relates the model parameters to the data that we observe:
The transformation from data to model parameters (or vice versa) is a result of the interaction of a physical system with the object that we wish to infer properties about. In other words, the transformation is the physics that relates the physical quantity (i.e. the model parameters) to the observed data.
The table below shows some examples of: physical systems, the governing physics, the physical quantity that we are interested, and what we actually observe.
Physical system | Governing equations | Physical quantity | Observed data |
---|---|---|---|
Earth's gravitational field | Newton's law of gravity | Density | Gravitational field |
Earth's magnetic field (at the surface) | Maxwell's equations | Magnetic susceptibility | Magnetic field |
Seismic waves (from earthquakes) | Wave equation | Wave-speed (density) | Particle velocity |
Linear algebra is useful in understanding the physical and mathematical construction of inverse problems, because of the presence of the transformation or "mapping" of data to the model parameters.
The objective of an inverse problem is to find the best model, , such that (at least approximately)
where is an operator describing the explicit relationship between the observed data, , and the model parameters. In various contexts, the operator is called forward operator, observation operator, or observation function. In the most general context, G represents the governing equations that relate the model parameters to the observed data (i.e. the governing physics).
In the case of a discrete linear inverse problem describing a linear system, and are vectors, and the problem can be written as
where is a matrix, often called the observation matrix.
Only a few physical systems are actually linear with respect to the model parameters. One such system from geophysics is that of the Earth's gravitational field. The Earth's gravitational field is determined by the density distribution of the Earth in the subsurface. Because the lithology of the Earth changes quite significantly, we are able to observe minute differences in the Earth's gravitational field on the surface of the Earth. From our understanding of gravity (Newton's Law of Gravitation), we know that the mathematical expression for gravity is: where is a measure of the local gravitational acceleration, is the universal gravitational constant, is the local mass (density) of the rock in the subsurface and is the distance from the mass to the observation point.
By discretizing the above expression, we are able to relate the discrete data observations on the surface of the Earth to the discrete model parameters (density) in the subsurface that we wish to know more about. For example, consider the case where we have 5 measurements on the surface of the Earth. In this case, our data vector, d is a column vector of dimension (5x1). We also know that we only have five unknown masses in the subsurface (unrealistic but used to demonstrate the concept). Thus, we can construct the linear system relating the five unknown masses to the five data points as follows:
Now, we can see that the system has five equations, , with five unknowns, . To solve for the model parameters that fit our data, we might be able to invert the matrix to directly convert the measurements into our model parameters. For example:
However, not all square matrices are invertible ( is almost never invertible). This is because we are not guaranteed to have enough information to uniquely determine the solution to the given equations unless we have independent measurements (i.e. each measurement adds unique information to the system). It's important to note that in most physical systems, we do not ever have enough information to uniquely constrain our solutions because the observation matrix does not contain unique equations. From a linear algebra perspective, the matrix is rank deficient (i.e. has zero eigenvalues), meaning that is not invertible. Further, if we add additional observations to our matrix (i.e. more equations), then the matrix is no longer square. Even then, we're not guaranteed to have full-rank in the observation matrix. Therefore, most inverse problems are considered to be underdetermined, meaning that we do not have unique solutions to the inverse problem. If we have a full-rank system, then our solution may be unique. Overdetermined systems (more equations than unknowns) have other issues.
Because we cannot directly invert the observation matrix, we use methods from optimization to solve the inverse problem. To do so, we define a goal, also known as an objective function, for the inverse problem. The goal is a functional that measures how close the predicted data from the recovered model fits the observed data. In the case where we have perfect data (i.e. no noise) and perfect physical understanding (i.e. we know the physics) then the recovered model should fit the observed data perfectly. The standard objective function, , is usually of the form:
which represents the L-2 norm of the misfit between the observed data and the predicted data from the model. We use the L-2 norm here as a generic measurement of the distance between the predicted data and the observed data, but other norms are possible for use. The goal of the objective function is to minimize the difference between the predicted and observed data.
To minimize the objective function (i.e. solve the inverse problem) we compute the gradient of the objective function using the same rationale as we would to minimize a function of only one variable. The gradient of the objective function is:
Which simplifies to:
After rearrangement, this becomes:
This expression is known as the Normal Equations and gives us a possible solution to the inverse problem. This is equivalent to Ordinary Least Squares
Additionally, we usually know that our data has random variations caused by random noise, or worse yet coherent noise. In any case, errors in the observed data introduces errors in the recovered model parameters that we obtain by solving the inverse problem. To avoid these errors, we may want to constrain possible solutions to emphasize certain possible features in our models. This type of constraint is known as regularization.
One central example of a linear inverse problem is provided by a Fredholm first kind integral equation.
For sufficiently smooth the operator defined above is compact on reasonable Banach spaces such as Lp spaces. Even if the mapping is injective its inverse will not be continuous. (However, by the bounded inverse theorem, if the mapping is bijective, then the inverse will be bounded (i.e. continuous).) Thus small errors in the data are greatly amplified in the solution . In this sense the inverse problem of inferring from measured is ill-posed.
To obtain a numerical solution, the integral must be approximated using quadrature, and the data sampled at discrete points. The resulting system of linear equations will be ill-conditioned.
Another example is the inversion of the Radon transform. Here a function (for example of two variables) is deduced from its integrals along all possible lines. This is precisely the problem solved in image reconstruction for X-ray computerized tomography. Although from a theoretical point of view many linear inverse problems are well understood, problems involving the Radon transform and its generalisations still present many theoretical challenges with questions of sufficiency of data still unresolved. Such problems include incomplete data for the x-ray transform in three dimensions and problems involving the generalisation of the x-ray transform to tensor fields.
A final example related to the Riemann Hypothesis was given by Wu and Sprung, the idea is that in the Semiclassical (old) Quantum theory the inverse of the potential inside the Hamiltonian is proportional to the half-derivative of the eigenvalues (energies) counting function n(x)
An inherently more difficult family of inverse problems are collectively referred to as non-linear inverse problems.
Non-linear inverse problems have a more complex relationship between data and model, represented by the equation:
Here is a non-linear operator and cannot be separated to represent a linear mapping of the model parameters that form into the data. In such research, the first priority is to understand the structure of the problem and to give a theoretical answer to the three Hadamard questions (so that the problem is solved from the theoretical point of view). It is only later in a study that regularization and interpretation of the solution's (or solutions', depending upon conditions of uniqueness) dependence upon parameters and data/measurements (probabilistic ones or others) can be done. Hence the corresponding following sections do not really apply to these problems. Whereas linear inverse problems were completely solved from the theoretical point of view at the end of the nineteenth century, only one class of nonlinear inverse problems was so before 1970, that of inverse spectral and (one space dimension) inverse scattering problems, after the seminal work of the Russian mathematical school (Krein, Gelfand, Levitan, Marchenko). A large review of the results has been given by Chadan and Sabatier in their book "Inverse Problems of Quantum Scattering Theory" (two editions in English, one in Russian).
In this kind of problem, data are properties of the spectrum of a linear operator which describe the scattering. The spectrum is made of eigenvalues and eigenfunctions, forming together the "discrete spectrum", and generalizations, called the continuous spectrum. The very remarkable physical point is that scattering experiments give information only on the continuous spectrum, and that knowing its full spectrum is both necessary and sufficient in recovering the scattering operator. Hence we have invisible parameters, much more interesting than the null space which has a similar property in linear inverse problems. In addition, there are physical motions in which the spectrum of such an operator is conserved as a consequence of such motion. This phenomenon is governed by special nonlinear partial differential evolution equations, for example the Korteweg–de Vries equation. If the spectrum of the operator is reduced to one single eigenvalue, its corresponding motion is that of a single bump that propagates at constant velocity and without deformation, a solitary wave called a "soliton".
A perfect signal and its generalizations for the Korteweg–de Vries equation or other integrable nonlinear partial differential equations are of great interest, with many possible applications. This area has been studied as a branch of mathematical physics since the 1970s. Nonlinear inverse problems are also currently studied in many fields of applied science (acoustics, mechanics, quantum mechanics, electromagnetic scattering - in particular radar soundings, seismic soundings and nearly all imaging modalities).
Inverse problems are typically ill posed, as opposed to the well-posed problems more typical when modeling physical situations where the model parameters or material properties are known. Of the three conditions for a well-posed problem suggested by Jacques Hadamard (existence, uniqueness, stability of the solution or solutions) the condition of stability is most often violated. In the sense of functional analysis, the inverse problem is represented by a mapping between metric spaces. While inverse problems are often formulated in infinite dimensional spaces, limitations to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problems being recast in discrete form. In this case the inverse problem will typically be ill-conditioned. In these cases, regularization may be used to introduce mild assumptions on the solution and prevent overfitting. Many instances of regularized inverse problems can be interpreted as special cases of Bayesian inference.
There are four main academic journals covering inverse problems in general.
In addition there are many journals on medical imaging, geophysics, non-destructive testing etc. that are dominated by inverse problems in those areas.